Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 1139 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 115.8 KiB |
| Average record size in memory | 104.1 B |
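The average record size can be recovered from the other two figures in this table. The short sketch below redoes that arithmetic, assuming the report uses binary units (1 KiB = 1024 bytes); the variable names are illustrative only.

```python
# Values copied from the "Dataset statistics" table.
total_size_kib = 115.8
n_observations = 1139

# Assumption: 1 KiB = 1024 bytes.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")  # prints "104.1 B"
```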
Variable types
| Numeric | 13 |
|---|
Warnings
| fixed acidity is highly correlated with citric acid and 2 other fields | High correlation |
|---|---|
| volatile acidity is highly correlated with citric acid | High correlation |
| citric acid is highly correlated with fixed acidity and 2 other fields | High correlation |
| free sulfur dioxide is highly correlated with total sulfur dioxide | High correlation |
| total sulfur dioxide is highly correlated with free sulfur dioxide | High correlation |
| density is highly correlated with fixed acidity and 1 other field | High correlation |
| pH is highly correlated with fixed acidity and 1 other field | High correlation |
| alcohol is highly correlated with density | High correlation |
| fixed acidity is highly correlated with citric acid and 2 other fields | High correlation |
| volatile acidity is highly correlated with citric acid | High correlation |
| citric acid is highly correlated with fixed acidity and 2 other fields | High correlation |
| free sulfur dioxide is highly correlated with total sulfur dioxide | High correlation |
| total sulfur dioxide is highly correlated with free sulfur dioxide | High correlation |
| density is highly correlated with fixed acidity | High correlation |
| pH is highly correlated with fixed acidity and 1 other field | High correlation |
| alcohol is highly correlated with quality | High correlation |
| quality is highly correlated with alcohol | High correlation |
| fixed acidity is highly correlated with pH | High correlation |
| free sulfur dioxide is highly correlated with total sulfur dioxide | High correlation |
| total sulfur dioxide is highly correlated with free sulfur dioxide | High correlation |
| pH is highly correlated with fixed acidity | High correlation |
| df_index is highly correlated with density and 1 other field | High correlation |
| free sulfur dioxide is highly correlated with total sulfur dioxide | High correlation |
| sulphates is highly correlated with citric acid and 1 other field | High correlation |
| pH is highly correlated with density and 3 other fields | High correlation |
| density is highly correlated with df_index and 3 other fields | High correlation |
| fixed acidity is highly correlated with df_index and 4 other fields | High correlation |
| citric acid is highly correlated with sulphates and 3 other fields | High correlation |
| total sulfur dioxide is highly correlated with free sulfur dioxide | High correlation |
| alcohol is highly correlated with pH and 2 other fields | High correlation |
| chlorides is highly correlated with sulphates and 1 other field | High correlation |
| df_index has unique values | Unique |
| citric acid has 106 (9.3%) zeros | Zeros |
Reproduction
| Analysis started | 2021-05-22 10:57:58.603899 |
|---|---|
| Analysis finished | 2021-05-22 10:58:52.109624 |
| Duration | 53.51 seconds |
| Software version | pandas-profiling v3.0.0 |
| Download configuration | config.json |
df_index
Real number (ℝ≥0)
| Distinct | 1139 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 788.451273 |
| Minimum | 1 |
|---|---|
| Maximum | 1598 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.0 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 66.9 |
| Q1 | 375.5 |
| median | 773 |
| Q3 | 1193.5 |
| 95-th percentile | 1528.1 |
| Maximum | 1598 |
| Range | 1597 |
| Interquartile range (IQR) | 818 |
Descriptive statistics
| Standard deviation | 470.5202595 |
|---|---|
| Coefficient of variation (CV) | 0.5967651719 |
| Kurtosis | -1.22841138 |
| Mean | 788.451273 |
| Median Absolute Deviation (MAD) | 408 |
| Skewness | 0.03707365356 |
| Sum | 898046 |
| Variance | 221389.3146 |
| Monotonicity | Strictly increasing |
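Several of the statistics above are derived from others in the same tables. As a rough consistency check, the sketch below recomputes the range, IQR, coefficient of variation, variance, and sum for df_index from the reported values; the inputs are rounded as printed in the report, so the results agree only to the displayed precision.

```python
# Values copied from the df_index tables above.
n_observations = 1139
minimum, maximum = 1, 1598
q1, q3 = 375.5, 1193.5
mean = 788.451273
std = 470.5202595

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance
total = mean * n_observations     # Sum

print(value_range, iqr)
print(f"CV={cv:.6f} variance={variance:.2f} sum={total:.0f}")
```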
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 1 | 0.1% |
| 1064 | 1 | 0.1% |
| 1073 | 1 | 0.1% |
| 1072 | 1 | 0.1% |
| 1070 | 1 | 0.1% |
| 1069 | 1 | 0.1% |
| 1066 | 1 | 0.1% |
| 1065 | 1 | 0.1% |
| 1063 | 1 | 0.1% |
| 1076 | 1 | 0.1% |
| Other values (1129) | 1129 | 99.1% |
| Value | Count | Frequency (%) |
| 1 | 1 | 0.1% |
| 2 | 1 | 0.1% |
| 3 | 1 | 0.1% |
| 5 | 1 | 0.1% |
| 6 | 1 | 0.1% |
| 7 | 1 | 0.1% |
| 8 | 1 | 0.1% |
| 10 | 1 | 0.1% |
| 12 | 1 | 0.1% |
| 13 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 1598 | 1 | 0.1% |
| 1597 | 1 | 0.1% |
| 1595 | 1 | 0.1% |
| 1594 | 1 | 0.1% |
| 1593 | 1 | 0.1% |
| 1591 | 1 | 0.1% |
| 1590 | 1 | 0.1% |
| 1589 | 1 | 0.1% |
| 1588 | 1 | 0.1% |
| 1587 | 1 | 0.1% |
fixed acidity
Real number (ℝ≥0)
| Distinct | 93 |
|---|---|
| Distinct (%) | 8.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8.287884109 |
| Minimum | 4.6 |
|---|---|
| Maximum | 15.9 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.0 KiB |
Quantile statistics
| Minimum | 4.6 |
|---|---|
| 5-th percentile | 6.1 |
| Q1 | 7.1 |
| median | 7.9 |
| Q3 | 9.2 |
| 95-th percentile | 11.7 |
| Maximum | 15.9 |
| Range | 11.3 |
| Interquartile range (IQR) | 2.1 |
Descriptive statistics
| Standard deviation | 1.725695601 |
|---|---|
| Coefficient of variation (CV) | 0.2082190796 |
| Kurtosis | 0.9416591529 |
| Mean | 8.287884109 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.8895824737 |
| Sum | 9439.9 |
| Variance | 2.978025308 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 7.8 | 44 | 3.9% |
| 7 | 40 | 3.5% |
| 7.6 | 36 | 3.2% |
| 7.2 | 36 | 3.2% |
| 7.1 | 35 | 3.1% |
| 7.5 | 34 | 3.0% |
| 7.9 | 34 | 3.0% |
| 8 | 32 | 2.8% |
| 7.7 | 31 | 2.7% |
| 7.3 | 30 | 2.6% |
| Other values (83) | 787 | 69.1% |
| Value | Count | Frequency (%) |
| 4.6 | 1 | 0.1% |
| 4.7 | 1 | 0.1% |
| 4.9 | 1 | 0.1% |
| 5 | 6 | 0.5% |
| 5.1 | 4 | 0.4% |
| 5.2 | 4 | 0.4% |
| 5.3 | 4 | 0.4% |
| 5.4 | 5 | 0.4% |
| 5.5 | 1 | 0.1% |
| 5.6 | 8 | 0.7% |
| Value | Count | Frequency (%) |
| 15.9 | 1 | 0.1% |
| 15.6 | 2 | 0.2% |
| 14.3 | 1 | 0.1% |
| 14 | 1 | 0.1% |
| 13.8 | 1 | 0.1% |
| 13.5 | 1 | 0.1% |
| 13.4 | 1 | 0.1% |
| 13.3 | 3 | 0.3% |
| 13.2 | 1 | 0.1% |
| 13 | 1 | 0.1% |
volatile acidity
Real number (ℝ≥0)
| Distinct | 139 |
|---|---|
| Distinct (%) | 12.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.5311720808 |
| Minimum | 0.16 |
|---|---|
| Maximum | 1.58 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.0 KiB |
Quantile statistics
| Minimum | 0.16 |
|---|---|
| 5-th percentile | 0.27 |
| Q1 | 0.39 |
| median | 0.52 |
| Q3 | 0.645 |
| 95-th percentile | 0.8705 |
| Maximum | 1.58 |
| Range | 1.42 |
| Interquartile range (IQR) | 0.255 |
Descriptive statistics
| Standard deviation | 0.1882786942 |
|---|---|
| Coefficient of variation (CV) | 0.3544589427 |
| Kurtosis | 1.240553462 |
| Mean | 0.5311720808 |
| Median Absolute Deviation (MAD) | 0.13 |
| Skewness | 0.7776814335 |
| Sum | 605.005 |
| Variance | 0.03544886667 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.58 | 34 | 3.0% |
| 0.4 | 33 | 2.9% |
| 0.5 | 29 | 2.5% |
| 0.42 | 29 | 2.5% |
| 0.38 | 27 | 2.4% |
| 0.39 | 27 | 2.4% |
| 0.31 | 26 | 2.3% |
| 0.49 | 25 | 2.2% |
| 0.34 | 24 | 2.1% |
| 0.56 | 24 | 2.1% |
| Other values (129) | 861 | 75.6% |
| Value | Count | Frequency (%) |
| 0.16 | 2 | 0.2% |
| 0.18 | 5 | 0.4% |
| 0.19 | 2 | 0.2% |
| 0.2 | 3 | 0.3% |
| 0.21 | 4 | 0.4% |
| 0.22 | 4 | 0.4% |
| 0.23 | 5 | 0.4% |
| 0.24 | 9 | 0.8% |
| 0.25 | 7 | 0.6% |
| 0.26 | 9 | 0.8% |
| Value | Count | Frequency (%) |
| 1.58 | 1 | 0.1% |
| 1.33 | 2 | 0.2% |
| 1.24 | 1 | 0.1% |
| 1.185 | 1 | 0.1% |
| 1.18 | 1 | 0.1% |
| 1.13 | 1 | 0.1% |
| 1.115 | 1 | 0.1% |
| 1.09 | 1 | 0.1% |
| 1.07 | 1 | 0.1% |
| 1.04 | 3 | 0.3% |
citric acid
Real number (ℝ≥0)
| Distinct | 80 |
|---|---|
| Distinct (%) | 7.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.273476734 |
| Minimum | 0 |
|---|---|
| Maximum | 1 |
| Zeros | 106 |
| Zeros (%) | 9.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.095 |
| median | 0.26 |
| Q3 | 0.43 |
| 95-th percentile | 0.6 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0.335 |
Descriptive statistics
| Standard deviation | 0.1963704762 |
|---|---|
| Coefficient of variation (CV) | 0.7180518552 |
| Kurtosis | -0.7937036894 |
| Mean | 0.273476734 |
| Median Absolute Deviation (MAD) | 0.17 |
| Skewness | 0.304091174 |
| Sum | 311.49 |
| Variance | 0.03856136391 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 106 | 9.3% |
| 0.49 | 50 | 4.4% |
| 0.24 | 33 | 2.9% |
| 0.08 | 31 | 2.7% |
| 0.02 | 29 | 2.5% |
| 0.4 | 25 | 2.2% |
| 0.1 | 24 | 2.1% |
| 0.26 | 23 | 2.0% |
| 0.31 | 22 | 1.9% |
| 0.42 | 22 | 1.9% |
| Other values (70) | 774 | 68.0% |
| Value | Count | Frequency (%) |
| 0 | 106 | 9.3% |
| 0.01 | 17 | 1.5% |
| 0.02 | 29 | 2.5% |
| 0.03 | 18 | 1.6% |
| 0.04 | 19 | 1.7% |
| 0.05 | 16 | 1.4% |
| 0.06 | 16 | 1.4% |
| 0.07 | 13 | 1.1% |
| 0.08 | 31 | 2.7% |
| 0.09 | 20 | 1.8% |
| Value | Count | Frequency (%) |
| 1 | 1 | 0.1% |
| 0.79 | 1 | 0.1% |
| 0.78 | 1 | 0.1% |
| 0.76 | 3 | 0.3% |
| 0.75 | 1 | 0.1% |
| 0.74 | 2 | 0.2% |
| 0.73 | 1 | 0.1% |
| 0.72 | 1 | 0.1% |
| 0.71 | 1 | 0.1% |
| 0.7 | 2 | 0.2% |
residual sugar
Real number (ℝ≥0)
| Distinct | 82 |
|---|---|
| Distinct (%) | 7.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.493634767 |
| Minimum | 1.2 |
|---|---|
| Maximum | 15.5 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.0 KiB |
Quantile statistics
| Minimum | 1.2 |
|---|---|
| 5-th percentile | 1.6 |
| Q1 | 1.9 |
| median | 2.2 |
| Q3 | 2.6 |
| 95-th percentile | 4.51 |
| Maximum | 15.5 |
| Range | 14.3 |
| Interquartile range (IQR) | 0.7 |
Descriptive statistics
| Standard deviation | 1.257423074 |
|---|---|
| Coefficient of variation (CV) | 0.504253105 |
| Kurtosis | 30.11981457 |
| Mean | 2.493634767 |
| Median Absolute Deviation (MAD) | 0.4 |
| Skewness | 4.517630559 |
| Sum | 2840.25 |
| Variance | 1.581112787 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 2 | 114 | 10.0% |
| 2.2 | 89 | 7.8% |
| 1.8 | 89 | 7.8% |
| 2.1 | 83 | 7.3% |
| 1.9 | 78 | 6.8% |
| 2.5 | 65 | 5.7% |
| 2.3 | 64 | 5.6% |
| 2.6 | 63 | 5.5% |
| 2.4 | 62 | 5.4% |
| 1.6 | 54 | 4.7% |
| Other values (72) | 378 | 33.2% |
| Value | Count | Frequency (%) |
| 1.2 | 6 | 0.5% |
| 1.3 | 5 | 0.4% |
| 1.4 | 24 | 2.1% |
| 1.5 | 21 | 1.8% |
| 1.6 | 54 | 4.7% |
| 1.65 | 2 | 0.2% |
| 1.7 | 52 | 4.6% |
| 1.75 | 2 | 0.2% |
| 1.8 | 89 | 7.8% |
| 1.9 | 78 | 6.8% |
| Value | Count | Frequency (%) |
| 15.5 | 1 | 0.1% |
| 13.9 | 1 | 0.1% |
| 13.4 | 1 | 0.1% |
| 12.9 | 1 | 0.1% |
| 10.7 | 1 | 0.1% |
| 9 | 1 | 0.1% |
| 8.9 | 1 | 0.1% |
| 8.6 | 1 | 0.1% |
| 8.3 | 3 | 0.3% |
| 7.9 | 1 | 0.1% |
chlorides
Real number (ℝ≥0)
| Distinct | 150 |
|---|---|
| Distinct (%) | 13.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.08889727831 |
| Minimum | 0.034 |
|---|---|
| Maximum | 0.611 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.0 KiB |
Quantile statistics
| Minimum | 0.034 |
|---|---|
| 5-th percentile | 0.052 |
| Q1 | 0.069 |
| median | 0.079 |
| Q3 | 0.091 |
| 95-th percentile | 0.1521 |
| Maximum | 0.611 |
| Range | 0.577 |
| Interquartile range (IQR) | 0.022 |
Descriptive statistics
| Standard deviation | 0.05205890085 |
|---|---|
| Coefficient of variation (CV) | 0.5856073643 |
| Kurtosis | 35.43490675 |
| Mean | 0.08889727831 |
| Median Absolute Deviation (MAD) | 0.011 |
| Skewness | 5.310015644 |
| Sum | 101.254 |
| Variance | 0.002710129158 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.078 | 37 | 3.2% |
| 0.08 | 37 | 3.2% |
| 0.079 | 35 | 3.1% |
| 0.084 | 33 | 2.9% |
| 0.074 | 33 | 2.9% |
| 0.075 | 30 | 2.6% |
| 0.082 | 30 | 2.6% |
| 0.07 | 29 | 2.5% |
| 0.076 | 29 | 2.5% |
| 0.081 | 29 | 2.5% |
| Other values (140) | 817 | 71.7% |
| Value | Count | Frequency (%) |
| 0.034 | 1 | 0.1% |
| 0.038 | 2 | 0.2% |
| 0.039 | 4 | 0.4% |
| 0.041 | 4 | 0.4% |
| 0.042 | 3 | 0.3% |
| 0.043 | 1 | 0.1% |
| 0.044 | 5 | 0.4% |
| 0.045 | 4 | 0.4% |
| 0.046 | 4 | 0.4% |
| 0.047 | 2 | 0.2% |
| Value | Count | Frequency (%) |
| 0.611 | 1 | 0.1% |
| 0.61 | 1 | 0.1% |
| 0.467 | 1 | 0.1% |
| 0.464 | 1 | 0.1% |
| 0.422 | 1 | 0.1% |
| 0.415 | 1 | 0.1% |
| 0.414 | 2 | 0.2% |
| 0.413 | 1 | 0.1% |
| 0.403 | 1 | 0.1% |
| 0.401 | 1 | 0.1% |
free sulfur dioxide
Real number (ℝ≥0)
HIGH CORRELATION
| Distinct | 58 |
|---|---|
| Distinct (%) | 5.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.92976295 |
| Minimum | 1 |
|---|---|
| Maximum | 72 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.0 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 7 |
| median | 14 |
| Q3 | 21 |
| 95-th percentile | 35 |
| Maximum | 72 |
| Range | 71 |
| Interquartile range (IQR) | 14 |
Descriptive statistics
| Standard deviation | 10.39257462 |
|---|---|
| Coefficient of variation (CV) | 0.6523998287 |
| Kurtosis | 1.717221049 |
| Mean | 15.92976295 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | 1.194146708 |
| Sum | 18144 |
| Variance | 108.0056072 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 6 | 104 | 9.1% |
| 5 | 72 | 6.3% |
| 12 | 55 | 4.8% |
| 15 | 54 | 4.7% |
| 7 | 51 | 4.5% |
| 10 | 48 | 4.2% |
| 9 | 48 | 4.2% |
| 16 | 46 | 4.0% |
| 13 | 41 | 3.6% |
| 17 | 41 | 3.6% |
| Other values (48) | 579 | 50.8% |
| Value | Count | Frequency (%) |
| 1 | 1 | 0.1% |
| 2 | 1 | 0.1% |
| 3 | 33 | 2.9% |
| 4 | 28 | 2.5% |
| 5 | 72 | 6.3% |
| 5.5 | 1 | 0.1% |
| 6 | 104 | 9.1% |
| 7 | 51 | 4.5% |
| 8 | 39 | 3.4% |
| 9 | 48 | 4.2% |
| Value | Count | Frequency (%) |
| 72 | 1 | 0.1% |
| 66 | 1 | 0.1% |
| 57 | 1 | 0.1% |
| 54 | 1 | 0.1% |
| 53 | 1 | 0.1% |
| 52 | 3 | 0.3% |
| 51 | 2 | 0.2% |
| 50 | 2 | 0.2% |
| 48 | 2 | 0.2% |
| 47 | 1 | 0.1% |
total sulfur dioxide
Real number (ℝ≥0)
HIGH CORRELATION
| Distinct | 140 |
|---|---|
| Distinct (%) | 12.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 47.15276558 |
| Minimum | 6 |
|---|---|
| Maximum | 289 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.0 KiB |
Quantile statistics
| Minimum | 6 |
|---|---|
| 5-th percentile | 11 |
| Q1 | 22 |
| median | 38 |
| Q3 | 63 |
| 95-th percentile | 114.1 |
| Maximum | 289 |
| Range | 283 |
| Interquartile range (IQR) | 41 |
Descriptive statistics
| Standard deviation | 33.99345304 |
|---|---|
| Coefficient of variation (CV) | 0.7209217236 |
| Kurtosis | 4.350239831 |
| Mean | 47.15276558 |
| Median Absolute Deviation (MAD) | 19 |
| Skewness | 1.584630373 |
| Sum | 53707 |
| Variance | 1155.55485 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 24 | 28 | 2.5% |
| 14 | 27 | 2.4% |
| 28 | 27 | 2.4% |
| 20 | 27 | 2.4% |
| 19 | 25 | 2.2% |
| 12 | 23 | 2.0% |
| 16 | 22 | 1.9% |
| 13 | 22 | 1.9% |
| 26 | 21 | 1.8% |
| 18 | 21 | 1.8% |
| Other values (130) | 896 | 78.7% |
| Value | Count | Frequency (%) |
| 6 | 1 | 0.1% |
| 7 | 4 | 0.4% |
| 8 | 8 | 0.7% |
| 9 | 12 | 1.1% |
| 10 | 19 | 1.7% |
| 11 | 18 | 1.6% |
| 12 | 23 | 2.0% |
| 13 | 22 | 1.9% |
| 14 | 27 | 2.4% |
| 15 | 21 | 1.8% |
| Value | Count | Frequency (%) |
| 289 | 1 | 0.1% |
| 278 | 1 | 0.1% |
| 165 | 1 | 0.1% |
| 160 | 1 | 0.1% |
| 155 | 1 | 0.1% |
| 153 | 1 | 0.1% |
| 152 | 1 | 0.1% |
| 151 | 2 | 0.2% |
| 149 | 1 | 0.1% |
| 148 | 2 | 0.2% |
density
Real number (ℝ≥0)
| Distinct | 400 |
|---|---|
| Distinct (%) | 35.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.9966482616 |
| Minimum | 0.9902 |
|---|---|
| Maximum | 1.0032 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.0 KiB |
Quantile statistics
| Minimum | 0.9902 |
|---|---|
| 5-th percentile | 0.99354 |
| Q1 | 0.99554 |
| median | 0.99666 |
| Q3 | 0.9978 |
| 95-th percentile | 0.9998 |
| Maximum | 1.0032 |
| Range | 0.013 |
| Interquartile range (IQR) | 0.00226 |
Descriptive statistics
| Standard deviation | 0.001834517257 |
|---|---|
| Coefficient of variation (CV) | 0.001840686757 |
| Kurtosis | 0.6544878003 |
| Mean | 0.9966482616 |
| Median Absolute Deviation (MAD) | 0.00114 |
| Skewness | 0.0114171626 |
| Sum | 1135.18237 |
| Variance | 3.365453566 × 10⁻⁶ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.9968 | 31 | 2.7% |
| 0.998 | 27 | 2.4% |
| 0.9976 | 25 | 2.2% |
| 0.9972 | 23 | 2.0% |
| 0.9964 | 19 | 1.7% |
| 0.9982 | 19 | 1.7% |
| 0.9962 | 18 | 1.6% |
| 0.997 | 18 | 1.6% |
| 0.9978 | 18 | 1.6% |
| 0.9966 | 17 | 1.5% |
| Other values (390) | 924 | 81.1% |
| Value | Count | Frequency (%) |
| 0.9902 | 1 | 0.1% |
| 0.9908 | 1 | 0.1% |
| 0.99084 | 1 | 0.1% |
| 0.9912 | 1 | 0.1% |
| 0.9915 | 1 | 0.1% |
| 0.99154 | 1 | 0.1% |
| 0.99157 | 1 | 0.1% |
| 0.99162 | 1 | 0.1% |
| 0.9917 | 1 | 0.1% |
| 0.99182 | 2 | 0.2% |
| Value | Count | Frequency (%) |
| 1.0032 | 1 | 0.1% |
| 1.00315 | 1 | 0.1% |
| 1.00289 | 1 | 0.1% |
| 1.0026 | 2 | 0.2% |
| 1.0018 | 1 | 0.1% |
| 1.0014 | 4 | 0.4% |
| 1.001 | 6 | 0.5% |
| 1.0008 | 3 | 0.3% |
| 1.0006 | 4 | 0.4% |
| 1.0004 | 5 | 0.4% |
pH
Real number (ℝ≥0)
| Distinct | 87 |
|---|---|
| Distinct (%) | 7.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.308902546 |
| Minimum | 2.74 |
|---|---|
| Maximum | 4.01 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.0 KiB |
Quantile statistics
| Minimum | 2.74 |
|---|---|
| 5-th percentile | 3.06 |
| Q1 | 3.21 |
| median | 3.31 |
| Q3 | 3.4 |
| 95-th percentile | 3.57 |
| Maximum | 4.01 |
| Range | 1.27 |
| Interquartile range (IQR) | 0.19 |
Descriptive statistics
| Standard deviation | 0.1551792664 |
|---|---|
| Coefficient of variation (CV) | 0.0468975028 |
| Kurtosis | 0.9968547701 |
| Mean | 3.308902546 |
| Median Absolute Deviation (MAD) | 0.1 |
| Skewness | 0.2736671599 |
| Sum | 3768.84 |
| Variance | 0.02408060473 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 3.26 | 37 | 3.2% |
| 3.3 | 37 | 3.2% |
| 3.34 | 37 | 3.2% |
| 3.31 | 35 | 3.1% |
| 3.38 | 35 | 3.1% |
| 3.32 | 35 | 3.1% |
| 3.39 | 33 | 2.9% |
| 3.28 | 33 | 2.9% |
| 3.36 | 31 | 2.7% |
| 3.22 | 31 | 2.7% |
| Other values (77) | 795 | 69.8% |
| Value | Count | Frequency (%) |
| 2.74 | 1 | 0.1% |
| 2.86 | 1 | 0.1% |
| 2.87 | 1 | 0.1% |
| 2.88 | 2 | 0.2% |
| 2.9 | 1 | 0.1% |
| 2.92 | 2 | 0.2% |
| 2.93 | 1 | 0.1% |
| 2.94 | 2 | 0.2% |
| 2.95 | 1 | 0.1% |
| 2.98 | 5 | 0.4% |
| Value | Count | Frequency (%) |
| 4.01 | 2 | 0.2% |
| 3.9 | 2 | 0.2% |
| 3.85 | 1 | 0.1% |
| 3.78 | 2 | 0.2% |
| 3.75 | 1 | 0.1% |
| 3.74 | 1 | 0.1% |
| 3.72 | 1 | 0.1% |
| 3.71 | 2 | 0.2% |
| 3.7 | 1 | 0.1% |
| 3.68 | 3 | 0.3% |
sulphates
Real number (ℝ≥0)
| Distinct | 93 |
|---|---|
| Distinct (%) | 8.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.6593064091 |
| Minimum | 0.33 |
|---|---|
| Maximum | 2 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.0 KiB |
Quantile statistics
| Minimum | 0.33 |
|---|---|
| 5-th percentile | 0.48 |
| Q1 | 0.55 |
| median | 0.62 |
| Q3 | 0.72 |
| 95-th percentile | 0.961 |
| Maximum | 2 |
| Range | 1.67 |
| Interquartile range (IQR) | 0.17 |
Descriptive statistics
| Standard deviation | 0.171697695 |
|---|---|
| Coefficient of variation (CV) | 0.2604216987 |
| Kurtosis | 10.24425283 |
| Mean | 0.6593064091 |
| Median Absolute Deviation (MAD) | 0.08 |
| Skewness | 2.366662472 |
| Sum | 750.95 |
| Variance | 0.02948009847 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.54 | 52 | 4.6% |
| 0.6 | 48 | 4.2% |
| 0.58 | 47 | 4.1% |
| 0.62 | 46 | 4.0% |
| 0.56 | 44 | 3.9% |
| 0.53 | 41 | 3.6% |
| 0.57 | 41 | 3.6% |
| 0.61 | 37 | 3.2% |
| 0.59 | 37 | 3.2% |
| 0.55 | 35 | 3.1% |
| Other values (83) | 711 | 62.4% |
| Value | Count | Frequency (%) |
| 0.33 | 1 | 0.1% |
| 0.37 | 2 | 0.2% |
| 0.4 | 2 | 0.2% |
| 0.42 | 3 | 0.3% |
| 0.43 | 8 | 0.7% |
| 0.44 | 8 | 0.7% |
| 0.45 | 7 | 0.6% |
| 0.46 | 10 | 0.9% |
| 0.47 | 15 | 1.3% |
| 0.48 | 27 | 2.4% |
| Value | Count | Frequency (%) |
| 2 | 1 | 0.1% |
| 1.98 | 1 | 0.1% |
| 1.62 | 1 | 0.1% |
| 1.61 | 1 | 0.1% |
| 1.59 | 1 | 0.1% |
| 1.56 | 1 | 0.1% |
| 1.36 | 3 | 0.3% |
| 1.34 | 1 | 0.1% |
| 1.33 | 1 | 0.1% |
| 1.31 | 1 | 0.1% |
alcohol
Real number (ℝ≥0)
| Distinct | 63 |
|---|---|
| Distinct (%) | 5.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.44186421 |
| Minimum | 8.4 |
|---|---|
| Maximum | 14.9 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.0 KiB |
Quantile statistics
| Minimum | 8.4 |
|---|---|
| 5-th percentile | 9.2 |
| Q1 | 9.5 |
| median | 10.2 |
| Q3 | 11.2 |
| 95-th percentile | 12.5 |
| Maximum | 14.9 |
| Range | 6.5 |
| Interquartile range (IQR) | 1.7 |
Descriptive statistics
| Standard deviation | 1.099889774 |
|---|---|
| Coefficient of variation (CV) | 0.1053346177 |
| Kurtosis | 0.1194272189 |
| Mean | 10.44186421 |
| Median Absolute Deviation (MAD) | 0.8 |
| Skewness | 0.8595552817 |
| Sum | 11893.28333 |
| Variance | 1.209757516 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 9.5 | 86 | 7.6% |
| 9.4 | 81 | 7.1% |
| 9.2 | 58 | 5.1% |
| 10 | 55 | 4.8% |
| 9.3 | 53 | 4.7% |
| 9.8 | 49 | 4.3% |
| 10.5 | 41 | 3.6% |
| 9.7 | 40 | 3.5% |
| 9.6 | 39 | 3.4% |
| 10.2 | 38 | 3.3% |
| Other values (53) | 599 | 52.6% |
| Value | Count | Frequency (%) |
| 8.4 | 2 | 0.2% |
| 8.5 | 1 | 0.1% |
| 8.7 | 2 | 0.2% |
| 9 | 13 | 1.1% |
| 9.05 | 1 | 0.1% |
| 9.1 | 19 | 1.7% |
| 9.2 | 58 | 5.1% |
| 9.233333333 | 1 | 0.1% |
| 9.25 | 1 | 0.1% |
| 9.3 | 53 | 4.7% |
| Value | Count | Frequency (%) |
| 14.9 | 1 | 0.1% |
| 14 | 5 | 0.4% |
| 13.6 | 4 | 0.4% |
| 13.56666667 | 1 | 0.1% |
| 13.5 | 1 | 0.1% |
| 13.4 | 3 | 0.3% |
| 13.3 | 3 | 0.3% |
| 13.2 | 1 | 0.1% |
| 13.1 | 2 | 0.2% |
| 13 | 4 | 0.4% |
quality
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.608428446 |
| Minimum | 3 |
|---|---|
| Maximum | 8 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.0 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 5 |
| median | 6 |
| Q3 | 6 |
| 95-th percentile | 7 |
| Maximum | 8 |
| Range | 5 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.8433337723 |
|---|---|
| Coefficient of variation (CV) | 0.1503689992 |
| Kurtosis | 0.3709331376 |
| Mean | 5.608428446 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.1631298923 |
| Sum | 6388 |
| Variance | 0.7112118514 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
| Value | Count | Frequency (%) |
| 5 | 480 | 42.1% |
| 6 | 442 | 38.8% |
| 7 | 138 | 12.1% |
| 4 | 53 | 4.7% |
| 8 | 16 | 1.4% |
| 3 | 10 | 0.9% |
| Value | Count | Frequency (%) |
| 3 | 10 | 0.9% |
| 4 | 53 | 4.7% |
| 5 | 480 | 42.1% |
| 6 | 442 | 38.8% |
| 7 | 138 | 12.1% |
| 8 | 16 | 1.4% |
| Value | Count | Frequency (%) |
| 8 | 16 | 1.4% |
| 7 | 138 | 12.1% |
| 6 | 442 | 38.8% |
| 5 | 480 | 42.1% |
| 4 | 53 | 4.7% |
| 3 | 10 | 0.9% |
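The Frequency (%) column in the value tables is each count divided by the number of observations. The sketch below recomputes it for quality from the counts listed above and also recovers the reported mean from the same counts; the dictionary and variable names are illustrative, not part of the report.

```python
# Counts copied from the quality tables; n is the number of observations.
quality_counts = {3: 10, 4: 53, 5: 480, 6: 442, 7: 138, 8: 16}
n_observations = 1139

# The counts should account for every row in the dataset.
assert sum(quality_counts.values()) == n_observations

# Frequency (%) = 100 * count / n, rounded to one decimal place.
frequency_pct = {
    value: round(100 * count / n_observations, 1)
    for value, count in quality_counts.items()
}

# The mean is the count-weighted average of the distinct values.
mean_quality = sum(v * c for v, c in quality_counts.items()) / n_observations

for value, pct in frequency_pct.items():
    print(f"quality {value}: {pct}%")
print(f"mean quality: {mean_quality:.4f}")
```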
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a perfectly linear relationship the steepness of the line does not affect r; only the sign of its slope does. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
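As an illustration of the calculation described above, here is a minimal pure-Python sketch. The function name `pearson_r` is hypothetical and this is not the code the report itself runs; the sample values are the first five fixed acidity and pH entries from the "First rows" table.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products and squares are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    # Note: a constant input makes the denominator zero (r is undefined).
    return cov / math.sqrt(ss_x * ss_y)


fixed_acidity = [7.8, 7.8, 11.2, 7.4, 7.9]
ph = [3.20, 3.26, 3.16, 3.51, 3.30]
print(pearson_r(fixed_acidity, ph))
```

Five rows are far too few to reproduce the report's correlations; the sample only shows the call shape.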
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those ranks.
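The rank-based calculation can be sketched in pure Python as follows. The helper names `average_ranks` and `spearman_rho` are hypothetical, and giving tied values the mean of their ranks is an assumption (a common convention), not something the report states.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_x) ** 2 for a in rx)
    ss_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(ss_x * ss_y)


# A nonlinear but strictly increasing relationship gives rho close to 1.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```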
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
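The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch, with the hypothetical name `kendall_tau_a`, follows; statistical libraries often apply a tie correction (tau-b) instead, so results on data with many ties, such as this dataset, may differ from the report's figures.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, with no correction for ties."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts toward neither group.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# One swapped pair out of six: 5 concordant, 1 discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

This brute-force loop is quadratic in the number of observations, which is fine for illustration but slower than the sorting-based algorithms libraries typically use.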
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | df_index | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 7.8 | 0.880 | 0.00 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.20 | 0.68 | 9.8 | 5 |
| 1 | 2 | 7.8 | 0.760 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.9970 | 3.26 | 0.65 | 9.8 | 5 |
| 2 | 3 | 11.2 | 0.280 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.9980 | 3.16 | 0.58 | 9.8 | 6 |
| 3 | 5 | 7.4 | 0.660 | 0.00 | 1.8 | 0.075 | 13.0 | 40.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 4 | 6 | 7.9 | 0.600 | 0.06 | 1.6 | 0.069 | 15.0 | 59.0 | 0.9964 | 3.30 | 0.46 | 9.4 | 5 |
| 5 | 7 | 7.3 | 0.650 | 0.00 | 1.2 | 0.065 | 15.0 | 21.0 | 0.9946 | 3.39 | 0.47 | 10.0 | 7 |
| 6 | 8 | 7.8 | 0.580 | 0.02 | 2.0 | 0.073 | 9.0 | 18.0 | 0.9968 | 3.36 | 0.57 | 9.5 | 7 |
| 7 | 10 | 6.7 | 0.580 | 0.08 | 1.8 | 0.097 | 15.0 | 65.0 | 0.9959 | 3.28 | 0.54 | 9.2 | 5 |
| 8 | 12 | 5.6 | 0.615 | 0.00 | 1.6 | 0.089 | 16.0 | 59.0 | 0.9943 | 3.58 | 0.52 | 9.9 | 5 |
| 9 | 13 | 7.8 | 0.610 | 0.29 | 1.6 | 0.114 | 9.0 | 29.0 | 0.9974 | 3.26 | 1.56 | 9.1 | 5 |
Last rows
| | df_index | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1129 | 1587 | 5.8 | 0.610 | 0.11 | 1.8 | 0.066 | 18.0 | 28.0 | 0.99483 | 3.55 | 0.66 | 10.9 | 6 |
| 1130 | 1588 | 7.2 | 0.660 | 0.33 | 2.5 | 0.068 | 34.0 | 102.0 | 0.99414 | 3.27 | 0.78 | 12.8 | 6 |
| 1131 | 1589 | 6.6 | 0.725 | 0.20 | 7.8 | 0.073 | 29.0 | 79.0 | 0.99770 | 3.29 | 0.54 | 9.2 | 5 |
| 1132 | 1590 | 6.3 | 0.550 | 0.15 | 1.8 | 0.077 | 26.0 | 35.0 | 0.99314 | 3.32 | 0.82 | 11.6 | 6 |
| 1133 | 1591 | 5.4 | 0.740 | 0.09 | 1.7 | 0.089 | 16.0 | 26.0 | 0.99402 | 3.67 | 0.56 | 11.6 | 6 |
| 1134 | 1593 | 6.8 | 0.620 | 0.08 | 1.9 | 0.068 | 28.0 | 38.0 | 0.99651 | 3.42 | 0.82 | 9.5 | 6 |
| 1135 | 1594 | 6.2 | 0.600 | 0.08 | 2.0 | 0.090 | 32.0 | 44.0 | 0.99490 | 3.45 | 0.58 | 10.5 | 5 |
| 1136 | 1595 | 5.9 | 0.550 | 0.10 | 2.2 | 0.062 | 39.0 | 51.0 | 0.99512 | 3.52 | 0.76 | 11.2 | 6 |
| 1137 | 1597 | 5.9 | 0.645 | 0.12 | 2.0 | 0.075 | 32.0 | 44.0 | 0.99547 | 3.57 | 0.71 | 10.2 | 5 |
| 1138 | 1598 | 6.0 | 0.310 | 0.47 | 3.6 | 0.067 | 18.0 | 42.0 | 0.99549 | 3.39 | 0.66 | 11.0 | 6 |